Abstract

The CLARO-CMOS is an application specific integrated circuit (ASIC) 
designed for fast photon counting with pixellated photodetectors such 
as multi-anode photomultiplier tubes (Ma-PMT), micro-channel plates 
(MCP), and silicon photomultipliers (SiPM). The first prototype has four 
channels, each with a charge sensitive amplifier with settable gain and a 
discriminator with settable threshold, providing fast hit information for 
each channel independently. The design was realized in a long-established,
stable and inexpensive 0.35 µm CMOS technology, and provides outstanding
performance in terms of speed and power dissipation. The prototype
consumes less than 1 mW per channel at low rate, and less than 2 mW at
an event rate of 10 MHz per channel. The recovery time after each pulse
is less than 25 ns for input signals up to a factor of 10 above threshold.
The input-referred RMS noise is about 7.7 ke⁻ (1.2 fC) with an input
capacitance of 3.3 pF. Thanks to the low noise and high speed, a timing
resolution down to 10 ps RMS was measured for typical photomultiplier
signals of a few million electrons, corresponding to the single photon
response of these detectors.

Keywords: Pixelated detectors and associated VLSI electronics; Cherenkov 
detectors; Instrumentation and methods for time-of-flight (TOF) spectroscopy; 
Analogue electronic circuits; Front-end electronics for detector readout; VLSI 
circuits 
1 Introduction



The fast counting of photons down to the single photon level is a basic
requirement shared among several applications, ranging from particle
identification in fundamental physics to imaging of biological processes in
nuclear medicine. In many cases the applications require pixellated
photodetectors with a pixel size of the order of a few square millimeters,
often placed side by side to increase the total photosensitive area. The total
number of pixels can be very large, up to the order of 10^5. Ring imaging
Cherenkov (RICH) detectors are among the most demanding cases: the
photodetectors are usually arranged to form planes of up to a few square
meters, ideally with no dead space between pixels.

Among the photodetectors which may be employed, multi-anode photomultiplier
tubes (Ma-PMT), thanks to their negligible dark count rate, are most often
the baseline for RICH detectors. New time-of-flight (TOF) detector designs
often employ light sensors with superior time resolution, such as microchannel
plates (MCP). Scintillator-based detectors usually generate a larger number of
photons per event, and can thus take advantage of light detectors with a higher
dark count rate but lower cost, such as silicon photomultipliers (SiPM). From
the point of view of the readout electronics, the signals from these
photodetectors have very similar characteristics, and the same readout circuits
can be used with minor adjustments. The typical photomultiplier gain is of the
order of 10^6, and the expected pixel capacitance is of the order of a few pF.
The charge collection time is short, of the order of one nanosecond. Other
kinds of photosensors exist which require different design solutions for the
readout electronics, but they are not considered here.

The main challenges in the realization of the electronic readout of such
systems stem from the large number and close packing of the readout channels.
This requires a low power dissipation to minimize cooling issues. Other
frequent requirements are the ability to sustain high count rates or to
perform precise timing measurements. These call for a wide bandwidth, which
conflicts with low power dissipation. A wide bandwidth also requires minimizing
the capacitance between pixels, which can be a major source of crosstalk. To
minimize this capacitance, the front-end electronics must be as close as
possible to the photosensors, which also helps in minimizing noise. However,
this raises design issues that lead back to power dissipation and cooling.
These trade-offs need to be tuned to the specific requirements of each
application.

Several application specific integrated circuits (ASIC) for photodetector
readout are already available, covering a wide range of applications
[1, 2, 3, 4, 5, 6, 7]. For instance, an ASIC suitable for timing measurements
with a resolution of 20 ps RMS is the NINO [1], designed in IBM 0.25 µm CMOS
technology, with a power consumption of 27 mW per channel. On the other hand,
an ASIC for fast photon counting with a lower power consumption is the
MAROC [2], designed in AMS 0.35 µm SiGe-BiCMOS technology, which consumes
about 5 mW per channel but was not designed for precise timing measurements.

The technological advances driven by the field of digital electronics and of
commercial portable communication devices, which also require wide bandwidth
at low power, can result in significant improvements in the field of fast
photodetector readout. Nevertheless, careful design in a rather aged (and
inexpensive) purely CMOS technology, such as the AMS 0.35 µm process, can still
yield excellent results at the cutting edge of timing performance and low
power. This is the approach followed in the design of the CLARO-CMOS, the first
prototype of an ASIC for photodetector readout presented in this paper.
Figure 1 shows a photograph of the 4-channel ASIC. The die area is 2 × 2 mm².

Figure 1: A photograph of the 4-channel CLARO-CMOS prototype.

The radiation hardness of the technology adopted is expected to be adequate
for most accelerator and space environments [8, 9]. However, the effects of
radiation on the circuit performance also depend on the design and layout of
a given device. The radiation hardness of the CLARO-CMOS prototype will be
measured in the near future, but is not considered in this paper.

2 Design of the prototype 

Figure 2 shows the block diagram of a channel of the CLARO-CMOS. The ASIC
is designed to operate between a positive 2.5 V supply rail and ground. The
charge sensitive amplifier (CSA) converts the input current pulse into a voltage
signal, which is AC coupled to a PMOS follower and to a discriminator (a voltage
comparator). The threshold of the discriminator is set by the programmable
static voltage at the noninverting input of the comparator. The schematics of
the charge sensitive amplifier and of the comparator are described in detail
in the following.

As will become clear later, the DC voltage at the output of the CSA is close
to the positive rail and its value is not stable against temperature variations.
For these reasons the AC coupling shown in figure 2 was introduced. In this
way the DC voltage at the inverting input of the comparator is held half-way
between the positive rail and ground, and is independent of temperature. The
AC coupling time constant is 55 ns. Since, as will be shown, the signals at the
output of the CSA are very fast, the AC coupling causes no noticeable baseline
shift unless the rate is larger than about 10 MHz.

Figure 2: The block diagram of a CLARO-CMOS channel.
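The rate dependence of the baseline shift can be made semi-quantitative with a rough model. An ideal AC coupling passes no DC component, so in steady state a periodic pulse train settles at zero mean and the baseline moves by roughly minus (rate × pulse area). The sketch below assumes the two-exponential CSA pulse with fall time constant τ_F = 5 ns and an illustrative rise time constant τ_R = 0.55 ns (an assumption), and it neglects the distortion of each pulse by the 55 ns coupling itself. The pulse area in units of Q/C_F equals τ_F, so at 10 MHz the baseline moves by about 5% of Q/C_F.

```python
import math


def csa_pulse(t, tau_r, tau_f):
    """Two-exponential CSA pulse in units of Q/C_F (zero before t = 0)."""
    if t < 0.0:
        return 0.0
    return tau_f / (tau_f - tau_r) * (math.exp(-t / tau_f) - math.exp(-t / tau_r))


def pulse_area(tau_r, tau_f, steps=100_000):
    """Trapezoidal integral of the pulse over 40 fall time constants."""
    t_max = 40.0 * tau_f
    dt = t_max / steps
    total = 0.5 * (csa_pulse(0.0, tau_r, tau_f) + csa_pulse(t_max, tau_r, tau_f))
    total += sum(csa_pulse(k * dt, tau_r, tau_f) for k in range(1, steps))
    return total * dt


def baseline_shift(rate_hz, tau_r, tau_f):
    """Steady-state baseline shift after ideal AC coupling, in units of Q/C_F.

    The coupling removes the DC component of the pulse train, so the
    baseline moves by -(rate x area); valid while pulses rarely overlap.
    """
    return -rate_hz * pulse_area(tau_r, tau_f)


TAU_R = 0.55e-9  # illustrative rise time constant (assumption)
TAU_F = 5e-9     # fall time constant of the CSA

for rate in (1e6, 10e6, 20e6):
    shift = baseline_shift(rate, TAU_R, TAU_F)
    print(f"{rate / 1e6:5.0f} MHz: baseline shift {shift:+.3f} Q/C_F")
```

The shift grows linearly with rate; whether it is "noticeable" depends on how close the threshold is set to the baseline.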

The auxiliary output buffer, realized with a small-area PMOS follower, is
primarily used for debugging purposes: it allows the signals at the inverting
input of the discriminator to be measured without loading the output of the
CSA. It needs to be biased by an external resistor tied to the positive supply
voltage, and is meant to be switched off during normal operation, when the
threshold is properly set and only the binary information at the output of the
discriminator is read out. In all the power consumption measurements presented
in the following, the analog buffer was off.

Gain and threshold are programmable through a 16-bit shift register, very
similar to an SPI interface. The first 8 bits control channel 1: three bits
control the gain of the CSA, as described in the following, and the remaining
five bits control the resistive divider at the noninverting input of the
comparator. The second group of 8 bits controls channel 2. In this prototype,
the settings for channels 3 and 4 are copied from those of channels 1 and 2.
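The bit layout inside the register is not specified beyond the 8 + 8 split, so the sketch below assumes a hypothetical ordering: the gain bits V_3, V_4, V_F first, then the 5-bit threshold code MSB first, with the channel 1 word shifted in before the channel 2 word. It only illustrates how two channel words fill the 16-bit register; the real ordering must be taken from the chip documentation.

```python
def channel_bits(v3, v4, vf, threshold):
    """Return the 8 configuration bits of one channel (hypothetical order)."""
    for bit in (v3, v4, vf):
        if bit not in (0, 1):
            raise ValueError("gain control bits must be 0 or 1")
    if not 0 <= threshold <= 31:
        raise ValueError("threshold code must fit in 5 bits (0-31)")
    threshold_bits = [(threshold >> i) & 1 for i in range(4, -1, -1)]  # MSB first
    return [v3, v4, vf] + threshold_bits


def config_stream(ch1, ch2):
    """16-bit stream: channel 1 word, then channel 2 word (assumed order).

    On the prototype, channels 3 and 4 reuse the settings of channels 1 and 2,
    so only two words are needed.
    """
    return channel_bits(*ch1) + channel_bits(*ch2)


# Channel 1: maximum gain (V3 = V4 = 0, VF = 1), threshold code 6
# Channel 2: attenuation enabled through V4, threshold code 10
stream = config_stream((0, 0, 1, 6), (0, 1, 1, 10))
print("".join(str(b) for b in stream))
```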

The design of the CLARO-CMOS is optimized for negative input charge
signals, that is, the ASIC is designed to be used with photodetectors where
electrons are collected at the readout electrode. To accommodate photodetectors
whose signals are made of holes, as for some SiPM models, the same design
could be reversed by swapping all NMOS and PMOS transistors in the CSA, with
the threshold settings changed accordingly.

2.1 Design of the CSA 

Figure 3 shows a simplified schematic of the CSA, which includes the parasitic
capacitance C_L and the input capacitance C_I for clarity. The input stage is an
active cascode [10, 11, 12], a design widely used in the field of photodetector
electronics, also referred to as super common base [13, 14, 15]. This design
uses a local feedback loop through N1 to lower the impedance at the source of
N2, so that the input current pulses are read on a virtual ground node. The
loop gain at intermediate frequency is g_1 R_C, where g_1 is the
transconductance of N1. The current pulses are integrated by the capacitor C_F
at the drain of N2, which discharges through the resistor R_F. The output
signal in response to a (negative) charge Q injected at t = 0 is given by

$$ V_{out}(t) = \frac{Q}{C_F}\,\frac{\tau_F}{\tau_F - \tau_R}\left(e^{-t/\tau_F} - e^{-t/\tau_R}\right) \qquad (1) $$

where τ_R is the rise time constant set by the CSA bandwidth and τ_F = C_F R_F
is the fall time constant. The rise time constant τ_R is of the order of 1 ns,
and is directly proportional to C_I, as will be shown. The ASIC is designed for
fast photodetectors, where the input current pulse is short, of the order of
1 ns. The fall time constant τ_F was chosen to be 5 ns, large enough for an
effective integration of fast pulses but small enough to sustain high rates
without pile-up.

Figure 3: Simplified schematic of the charge sensitive amplifier (CSA). The
parasitic capacitances C_L and C_I are shown in grey. C_I also includes the
sensor capacitance.
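For the two-exponential pulse, the time and height of the maximum follow in closed form by setting dV/dt = 0: the peak occurs at t* = τ_R τ_F ln(τ_F/τ_R)/(τ_F − τ_R) and its height, in units of Q/C_F, is r^(r/(1−r)) with r = τ_R/τ_F. The sketch evaluates both for τ_F = 5 ns and an illustrative τ_R = 0.55 ns (an assumption, roughly matching a 1.2 ns 10% to 90% rise time). It shows that a finite rise time lowers the peak somewhat below the ideal value Q/C_F.

```python
import math


def peak_time(tau_r, tau_f):
    """Time at which the two-exponential pulse reaches its maximum."""
    return tau_r * tau_f * math.log(tau_f / tau_r) / (tau_f - tau_r)


def pulse(t, tau_r, tau_f):
    """CSA output in units of Q/C_F for t >= 0."""
    return tau_f / (tau_f - tau_r) * (math.exp(-t / tau_f) - math.exp(-t / tau_r))


def peak_fraction(tau_r, tau_f):
    """Peak height in units of Q/C_F; equals r**(r / (1 - r)) with r = tau_r/tau_f."""
    return pulse(peak_time(tau_r, tau_f), tau_r, tau_f)


tau_r, tau_f = 0.55e-9, 5e-9  # illustrative values (assumption for tau_r)
print(f"peak at {peak_time(tau_r, tau_f) * 1e9:.2f} ns, "
      f"height {peak_fraction(tau_r, tau_f):.3f} Q/C_F")
```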

In the simplified scheme of figure 3, the main voltage (or series) noise source
is N1 together with the bias circuit I_1, while the main current (or parallel)
noise source is R_F together with the bias circuit I_2. Transistor N2 also
contributes to the series noise, but its contribution is divided by the loop
gain and becomes negligible. The optimal noise performance is obtained when N1
is biased with a large current I_1, which keeps its transconductance high and
its series noise low. Since R_F contributes to the parallel noise, its value
cannot be too small, and this sets an upper limit on the bias current I_2 of
N2. With a low bias current, the transconductance g_2 of N2 is low, and the
input capacitance to ground C_I, due to the input bonding pad, the bonding
wire, the package, the interconnects and the sensor, adds a pole to the input
feedback loop at a frequency g_2/(2πC_I). If R_C and C_C were not present, the
load at the drain of N1 would be purely capacitive, and there would be another
pole at very low frequency due to C_L. This would be the lower-frequency pole
of the feedback loop. At the frequency of the second pole, that is g_2/(2πC_I),
the feedback loop would become unstable, unless the loop gain had already
dropped below unity, in which case the loop would be ineffective in lowering
the input impedance at this frequency. This case is illustrated by the dashed
line in the Bode plot of figure 4.

Figure 4: Bode diagram illustrating the stability of the input feedback loop.

R_C and C_C are used to compensate the pole due to C_L. This case is
illustrated by the solid line in figure 4. The effect of the compensation is to
limit the loop gain to g_1 R_C at moderate frequencies, above 1/(2πC_C R_C).
This shifts the pole due to C_L to a higher frequency, given by 1/(2πC_L R_C).
For this compensation to be effective, R_C must not be too large and C_L must
be minimized with a proper layout. In particular, since C_C occupies a larger
area on silicon than R_C, its parasitic capacitance to the substrate is larger.
A much lower value of C_L is therefore obtained if R_C is placed before C_C, as
in figure 3. The relatively low value of R_C strengthens the need to keep the
transconductance of N1 high, while the transconductance of N2 is less critical.
As illustrated by the solid line in figure 4, the dominant pole of the input
feedback loop is now at g_2/(2πC_I). This ensures that the feedback loop is
effective in lowering the input impedance up to a much higher frequency. The
frequency at which the loop gain approaches unity sets the bandwidth of the
CSA. The associated time constant gives the rise time of the output signal:
$$ \tau_R \simeq \frac{C_I}{g_1 R_C\, g_2} \qquad (2) $$

The 10% to 90% rise time is given by 2.2τ_R. The rise time is thus directly
proportional to the input capacitance C_I and inversely proportional to the loop
gain g_1 R_C. The stability of the feedback loop is ensured even if the sensor
has a negligible capacitance, since C_I has a lower limit of a few pF. This
limit is set by the gate-drain capacitance of N1, which is less than 100 fF but
is multiplied by the loop gain, by its gate-source and gate-bulk capacitances
(about 0.5 pF in total), and by the stray capacitance of the pads, the bonding
wires, etc. Considering all the contributions from the circuit, the input
capacitance can be estimated to be about 1.5 pF, bonding pads excluded. With
the CLARO-CMOS mounted in a small QFN48 package, the total capacitance at the
input (without the sensor) was measured to be about 3.3 pF.
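A minimal sketch of the rise-time estimate, assuming the single-pole form τ_R ≈ C_I/(g_1 R_C g_2), i.e. the inverse of the unity-gain angular frequency of a loop with mid-band gain g_1 R_C and a pole at g_2/(2πC_I). It uses the approximate "low power" values given further on (g_1 = 2 mA/V, R_C = 10 kΩ, g_2 ≈ 350 µA/V). Since g_2 depends on which of N3 and N4 are enabled, the absolute numbers are only indicative; what the sketch shows reliably is the linear scaling with C_I.

```python
import math


def tau_rise(c_in, g1, r_c, g2):
    """Rise time constant: inverse of the loop unity-gain angular frequency."""
    return c_in / (g1 * r_c * g2)


def rise_10_90(tau):
    """10%-90% rise time of a single-pole response: ln(9) * tau, about 2.2 tau."""
    return math.log(9.0) * tau


G1, R_C, G2 = 2e-3, 10e3, 350e-6  # "low power" mode values (approximate)

for c_in in (3.3e-12, 6.5e-12):
    tau = tau_rise(c_in, G1, R_C, G2)
    print(f"C_I = {c_in * 1e12:.1f} pF: tau_R = {tau * 1e9:.2f} ns, "
          f"10-90% = {rise_10_90(tau) * 1e9:.2f} ns")
```

The estimate is sensitive to g_2: a lower effective g_2 for a given N3/N4 setting lengthens the rise time proportionally.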
Figure 5: Full schematic of the CSA. The width of the MOS transistors is shown.
The gate length is 0.35 µm for all the transistors in the CSA. The substrate of
all NMOS (PMOS) transistors is tied to ground (to the positive rail).

The full schematic of the CSA is shown in figure 5. To vary the gain, a set
of MOS switches was included in the design. Two switches, NS3 and NS4, are
used to attenuate the input signal: if the digital control signal V_3 or V_4 is
set high, the corresponding switch is closed and a part of the input charge
passes through N3 or N4 and is wasted on the positive rail. The amount of
attenuation B is set by the dimensions of N3 and N4, which are 3 and 6 times
larger than N2 respectively, giving attenuations of B = 4 and B = 7. An
attenuation factor of B = 10 is obtained if both branches are enabled. The
dummy switch NS2, whose gate is tied to the positive rail, was introduced to
preserve the symmetry between the input branches.

Another switch, PSF, controlled by the digital control signal V_F, is used to
change the values of C_F and R_F, doubling C_F and halving R_F, so that the
gain changes by a factor of 2 while the discharge time constant stays the same.
The voltages V_3, V_4 and V_F are the three control bits that set the gain of
each channel. Only one switch was used to change the values of C_F and R_F
because of the switch parasitics. If several switches had been connected in
series, their series resistance in the "on" state would have distorted the
shape of the output signal. If several such switches had been put in parallel,
their capacitance in the "off" state would have been in parallel with C_F,
reducing the maximum achievable gain.
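Taking the width ratios above at face value (the charge reaching C_F is the fraction 1/(1 + 3V_3 + 6V_4) of the input charge), the eight gain settings can be tabulated as in the sketch below. Mapping V_F = 1 to the smaller C_F follows from the maximum-gain setting (V_3 = V_4 = 0, V_F = 1) used in the measurements; the linear charge-splitting model is an idealization that ignores switch parasitics.

```python
import itertools


def attenuation(v3, v4):
    """Attenuation B: N3 and N4 are 3 and 6 times wider than N2, so the
    charge reaching C_F is 1/(1 + 3*V3 + 6*V4) of the input charge."""
    return 1 + 3 * v3 + 6 * v4


def relative_gain(v3, v4, vf):
    """Gain relative to the maximum setting (V3 = V4 = 0, VF = 1).
    VF = 0 doubles C_F (and halves R_F), which halves the gain."""
    return (1.0 if vf else 0.5) / attenuation(v3, v4)


for v3, v4, vf in itertools.product((0, 1), repeat=3):
    print(f"V3={v3} V4={v4} VF={vf}: B={attenuation(v3, v4):2d}, "
          f"relative gain={relative_gain(v3, v4, vf):.3f}")
```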

The dimensions of the bias transistors NB1 to NB5 were chosen so that the
bias current of N1 is about 2.5 times larger than that of N2. Transistor N1 has
a very large area to obtain a high transconductance g_1. In this prototype the
bias current of the CSA can be set by changing I_A with an external resistor.
Two operating modes were chosen: a "low power" mode, with I_A = 2 µA, and a
"timing" mode, with I_A = 5 µA. In "low power" mode, N1 is biased with 85 µA,
resulting in g_1 = 2 mA/V. Since R_C = 10 kΩ, the low-frequency gain of the
input feedback loop is about 20. The input branches with N2, N3 and N4 are
biased with a total current of 25 µA. The total transconductance of N2 in
parallel with N3 and N4 is about 350 µA/V, depending on which of N3 and N4 are
enabled. Without the feedback loop, the input impedance would be higher than
2 kΩ; the feedback loop lowers it to about 130 Ω. From equation (2), the 10% to
90% rise time is expected to be about 1.2 ns for C_I = 3.3 pF, and 2.4 ns for
C_I = 6.5 pF. In "timing" mode, N1 is biased with 170 µA and its
transconductance becomes 3.8 mA/V, so that the loop gain roughly doubles. The
total transconductance of N2, N3 and N4 is about 500 µA/V. Thanks to the larger
loop gain, the input impedance is reduced to less than 100 Ω. The bandwidth of
the CSA is increased, and the loop gain at 1/(2πC_L R_C) comes closer to unity,
but the stability of the feedback loop is still ensured even with a negligible
sensor capacitance. The rise time of the signal at the output of the CSA, as
given by equation (2), is roughly half of that in "low power" mode, thanks to
the larger loop gain. The main consequence is a reduction of the time walk of
the discriminator, as will be shown in the following.
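A minimal sketch of the input impedance estimate, assuming the usual active-cascode approximation Z_in ≈ 1/(g_2 (1 + g_1 R_C)), in which the local loop divides the open-loop impedance 1/g_2 by one plus the loop gain. It uses the approximate transconductances quoted above for the two modes and is intended only as an order-of-magnitude check.

```python
R_C = 10e3  # compensation resistor [ohm]

MODES = {
    "low power": {"g1": 2e-3, "g2": 350e-6},
    "timing": {"g1": 3.8e-3, "g2": 500e-6},
}


def loop_gain(g1, r_c=R_C):
    """Mid-band gain of the local feedback loop through N1."""
    return g1 * r_c


def input_impedance(g1, g2, r_c=R_C):
    """Open-loop impedance 1/g2 reduced by (1 + loop gain)."""
    return 1.0 / (g2 * (1.0 + loop_gain(g1, r_c)))


for name, p in MODES.items():
    print(f"{name:9s}: loop gain {loop_gain(p['g1']):.0f}, "
          f"open loop {1 / p['g2']:.0f} ohm, "
          f"closed loop {input_impedance(p['g1'], p['g2']):.0f} ohm")
```

With these inputs the estimates come out near 136 Ω and 51 Ω, in line with the "about 130 Ω" and "less than 100 Ω" quoted above.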

The noise of the CSA can be referred to the input as an equivalent noise
charge (ENC). The detailed noise calculations are given in appendix A.1. For
τ_R < 0.3 τ_F, that is for C_I < 10 pF in "low power" mode, the ENC is given by

$$ \mathrm{ENC} \simeq \sqrt{\frac{i_n^2\,\tau_F}{4} + \frac{e_n^2\,C_I^2}{4\,\tau_R} + A_f\,C_I^2 \ln\frac{\tau_F}{\tau_R}} \qquad (3) $$

where i_n is the current noise density, e_n is the white voltage noise density
and A_f is the 1/f voltage noise coefficient. In addition to the noise from N1
and R_F, it is necessary to consider the noise contributions from the bias
transistor NB2, whose current noise contributes directly to the parallel noise
at the input, and from PB5, whose current noise is divided by the
transconductance of N1 and becomes a series noise contribution at the input.
Moreover, if the filtering capacitors C_B2 and C_B5 are not large enough,
additional noise from NB1, NB3 and PB4 can be injected through NB2 and PB5,
contributing to the parallel and series noise respectively.

In this first CLARO-CMOS prototype, the filter capacitors C_B2 and C_B5 are not
present. The parallel noise is dominated by the channel current noise of NB1,
mirrored and multiplied by 10 by NB2. Since in "low power" mode the
transconductance of NB1 is g_B1 = 35 µA/V, we have

$$ i_{n,B1}^2 = 10^2 \times \frac{8}{3}\,kT\,g_{B1} \simeq \left(6.2~\mathrm{pA}/\sqrt{\mathrm{Hz}}\right)^2 \qquad (4) $$




Other contributions come from NB2, about 2 pA/√Hz, and from R_F, about
0.9 pA/√Hz if V_F = 1 and 1.3 pA/√Hz if V_F = 0, assuming B = 1. The weight of
the noise generated by R_F is directly proportional to the attenuation factor B:
at the maximum attenuation, that is with B = 10, the noise from R_F becomes
the dominant parallel noise source, with 9 pA/√Hz if V_F = 1 and 13 pA/√Hz if
V_F = 0. The other noise sources in the CSA do not depend on B, since they
undergo the same attenuation as the signal. In any case, the attenuation is
meant to be used only with large signals, for which the signal-to-noise ratio
is expected to be adequate anyway. In the following, for all noise
evaluations, we will consider B = 1. The sum of all parallel noise
contributions is thus close to 7 pA/√Hz in "low power" mode with B = 1. In
"timing" mode the parallel noise increases by about 20% due to the larger bias
current, which gives a larger transconductance to NB1 and NB2.

The series noise is dominated by N1 and PB5. As already mentioned, additional
noise from the other bias transistors is injected through PB5, since its gate
is not filtered. In "low power" mode, where g_1 = 2 mA/V, the series white
noise is dominated by NB1, NB3 and PB4, which all have a transconductance of
about g_B1 = 35 µA/V. The resulting white voltage noise at the input is

$$ e_{n,B}^2 = 25^2 \times 3 \times \frac{8}{3}\,kT\,\frac{g_{B1}}{g_1^2} \simeq \left(13~\mathrm{nV}/\sqrt{\mathrm{Hz}}\right)^2 \qquad (5) $$

where 25 is the area ratio between PB5 and PB4. Other contributions come from
N1, about 2.3 nV/√Hz, and from PB5, about 1.6 nV/√Hz. The sum of all series
white noise contributions is about e_n ≈ 14 nV/√Hz. In "timing" mode the series
noise is reduced by almost a factor of 2, because of the larger
transconductance of N1, which gives a larger loop gain. Compared to the series
white noise, the contribution of the 1/f component is expected to be
negligible, since from simulations it is possible to estimate A_f < 10^-9 V².
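The arithmetic behind equations (4) and (5) and the quoted totals can be reproduced in a few lines. The sketch assumes T = 300 K, the (8/3)kT·g_m white channel-noise model for saturated MOSFETs (an assumption about the factor used in those equations), and uncorrelated sources that add in quadrature.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant [J/K]
T = 300.0           # assumed operating temperature [K]


def channel_current_noise(gm):
    """One-sided white channel noise of a saturated MOSFET, (8/3) kT gm [A^2/Hz]."""
    return 8.0 / 3.0 * K_B * T * gm


def quadrature(*terms):
    """Sum of uncorrelated noise densities."""
    return math.sqrt(sum(x * x for x in terms))


G_B1 = 35e-6  # transconductance of NB1, NB3, PB4 in "low power" mode [A/V]
G_1 = 2e-3    # transconductance of N1 in "low power" mode [A/V]

# Equation (4): NB1 channel noise mirrored and multiplied by 10 in NB2
i_nb1 = math.sqrt(10**2 * channel_current_noise(G_B1))
# Equation (5): NB1, NB3 and PB4 noise scaled by the 25:1 area ratio of PB5,
# then divided by g_1 to refer it to the input as a voltage
e_nb = math.sqrt(25**2 * 3 * channel_current_noise(G_B1)) / G_1

i_n_total = quadrature(i_nb1, 2e-12, 0.9e-12)  # + NB2 and R_F (V_F = 1, B = 1)
e_n_total = quadrature(e_nb, 2.3e-9, 1.6e-9)   # + N1 and PB5

print(f"eq. (4): {i_nb1 * 1e12:.2f} pA/sqrt(Hz), total {i_n_total * 1e12:.2f} pA/sqrt(Hz)")
print(f"eq. (5): {e_nb * 1e9:.2f} nV/sqrt(Hz), total {e_n_total * 1e9:.2f} nV/sqrt(Hz)")
```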

According to equations (2) and (3), the parallel noise contribution to the ENC
at the output of the CSA is expected to be about 1.8 ke⁻ (0.29 fC) at
C_I = 3.3 pF, and 2.0 ke⁻ (0.32 fC) at C_I = 6.5 pF. The series noise
contribution is expected to be about 7.5 ke⁻ (1.2 fC) at C_I = 3.3 pF, and
12 ke⁻ (1.9 fC) at C_I = 6.5 pF. The total noise of the CSA is thus expected to
be 7.7 ke⁻ (1.2 fC) at C_I = 3.3 pF, and 12 ke⁻ (1.9 fC) at C_I = 6.5 pF. At
the auxiliary output, the rise time is limited by the bandwidth of the analog
buffer. In that case the weight of the series noise is expected to be smaller,
while the weight of the parallel noise is expected to be larger, according to
equation (3). For instance, assuming that the output buffer limits the output
signals to time constants of τ_R = 1.3 ns and τ_F = 7.2 ns, equation (3) gives
5.6 ke⁻ (0.89 fC) with an input capacitance of 3.3 pF, dominated by the series
noise.
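As a cross-check of these figures, the white-noise part of the ENC can be evaluated without small-τ_R approximations, i.e. exactly for the two-exponential CSA pulse, including the reduction of the pulse peak at finite τ_R. The sketch assumes one-sided noise densities, neglects the 1/f term, and takes τ_R = 0.55 ns (from a rise time of about 1.2 ns), τ_F = 5 ns, i_n = 7 pA/√Hz and e_n = 14 nV/√Hz. With these inputs it gives roughly 1.9 ke⁻ of parallel noise and 7.7 ke⁻ of series noise at C_I = 3.3 pF, close to the values quoted above.

```python
import math

Q_E = 1.602176634e-19  # elementary charge [C]


def enc_white(i_n, e_n, c_in, tau_r, tau_f):
    """Parallel, series and total white-noise ENC, in electrons.

    For h(t) = tau_f/(tau_f - tau_r) * (exp(-t/tau_f) - exp(-t/tau_r)),
    Parseval's theorem with one-sided densities gives the charge variances
      parallel: i_n^2 tau_f^2 / (4 (tau_f + tau_r))
      series:   e_n^2 C_I^2 tau_f / (4 tau_r (tau_f + tau_r))
    which are then divided by the normalized pulse peak r**(r/(1-r)).
    """
    r = tau_r / tau_f
    peak = r ** (r / (1.0 - r))
    var_par = i_n**2 * tau_f**2 / (4.0 * (tau_f + tau_r))
    var_ser = e_n**2 * c_in**2 * tau_f / (4.0 * tau_r * (tau_f + tau_r))
    enc_par = math.sqrt(var_par) / peak / Q_E
    enc_ser = math.sqrt(var_ser) / peak / Q_E
    return enc_par, enc_ser, math.hypot(enc_par, enc_ser)


par, ser, total = enc_white(i_n=7e-12, e_n=14e-9, c_in=3.3e-12,
                            tau_r=0.55e-9, tau_f=5e-9)
print(f"parallel {par / 1e3:.1f} ke-, series {ser / 1e3:.1f} ke-, "
      f"total {total / 1e3:.1f} ke-")
```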

As already discussed, the filtering capacitors C_B2 and C_B5 can be used to
improve the noise performance of the design, considerably reducing both the
series and the parallel noise injected through the bias transistors, at the
price of a larger layout area on silicon. This improvement will be considered
for the next versions of the ASIC.
Figure 6: Full schematic of the comparator. The width of the MOS transistors
is shown. The gate length is 0.35 µm for all the transistors except PH. The
substrate of all NMOS (PMOS) transistors is tied to ground (to the positive
rail).



2.2 Design of the comparator 

Figure 6 shows the schematic of the comparator. The input stage is a
differential pair loaded with a current mirror. This is the only part of the
comparator that draws a continuous current. Since I_Q is about 1 µA, the
differential pair is biased with about 100 µA. The signal from the CSA is
connected to the inverting input of the comparator, while the noninverting
input is held at a constant potential which defines the threshold. The
threshold voltage at the noninverting input of the discriminator can be set
between 1.25 V (half the positive rail voltage) and 0.83 V (one third of the
positive rail voltage) in 32 steps, labelled from 0 to 31, by a 5-bit DAC
implemented as a simple voltage divider. Each step is about 13 mV. At the
maximum gain, this corresponds to a threshold step of 150 ke⁻ (24 fC).
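A minimal sketch of the threshold code mapping, assuming the 5-bit divider is uniform with a step of (1.25 V − 0.83 V)/32 ≈ 13 mV and that the nominal 150 ke⁻ per step at maximum gain applies linearly. Both are simplifications, since the actual divider taps are not given; in the measurements of section 3, for instance, code 6 is quoted as 800 ke⁻ rather than 6 × 150 ke⁻.

```python
V_DD = 2.5                           # positive supply rail [V]
V_CODE0 = V_DD / 2                   # code 0: half the rail, i.e. the comparator baseline
STEP_V = (V_DD / 2 - V_DD / 3) / 32  # ~13 mV per code (assumed uniform divider)
STEP_E = 150e3                       # nominal threshold step at maximum gain [electrons]
Q_E = 1.602176634e-19                # elementary charge [C]


def threshold(code):
    """Return (threshold voltage [V], nominal threshold charge [fC]) for a 5-bit code."""
    if not 0 <= code <= 31:
        raise ValueError("threshold code must be in 0..31")
    return V_CODE0 - code * STEP_V, code * STEP_E * Q_E * 1e15


for code in (0, 1, 6, 31):
    v, q = threshold(code)
    print(f"code {code:2d}: {v:.3f} V, ~{q:.0f} fC at maximum gain")
```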

In the ready state, the output of the differential pair is low, at about 0.5 V.
This signal feeds the inverter made of P8 and N8. Transistor N8 is small and
has a large threshold voltage, about 0.6 V, so that it is biased just below
threshold: no current flows through the first inverter and its output is high.
Transistor PH provides hysteresis; since its gate is high, it is switched off.
The output of P8 and N8 is fed to the second inverter, made of P9 and N9, which
is also the output stage.

In response to a negative pulse from the CSA, the output of the differential
pair goes up, close to the positive rail. The output of the first inverter goes
to ground, switching on PH, which draws current from the differential pair and
holds up its output, providing hysteresis. At the same time, the output of P9
and N9 swings to the positive rail. The gate length of PH is large: its "on"
resistance is about 150 MΩ, so that only a fraction of the bias current of the
differential pair flows through PH, and after a few nanoseconds the output of
the differential pair is able to return to its initial condition. When the
output of the differential pair goes down, the output of the first inverter
goes up, PH is switched off and the output of the comparator goes down. After
this the discriminator is ready to trigger on another pulse from the CSA. The
width of the output pulses grows with the amplitude of the input signals, which
allows time-over-threshold algorithms to be used to determine the input charge
and to compensate for time walk.

The gain of the input stage of the comparator is about 30 V/V for small
signals around threshold at low frequency, with a pole at about 30 MHz. The
corresponding time constant is τ_C = 5 ns, about the same as the fall time
constant τ_F of the CSA pulse. The effect of hysteresis is to increase the gain
to 600 V/V at low frequency. The gain of each inverter is about 20 V/V. The
overall gain of the comparator at low frequency, including hysteresis, is thus
2.4 × 10^5 V/V, or 107 dB. Transistor P9 is much larger than N9, in order to
obtain a very fast transition on the rising edge at the output. The rise and
fall times of the output signal depend on the load at the output of the
discriminator. The output stage was designed to drive only a short line to a
digital processing circuit or to an external low-impedance driver, located a
few cm away on the same board. Thus a purely capacitive load of a few pF is
expected. This choice gives maximum flexibility in the design of a full system
and avoids unnecessary power consumption in the CLARO-CMOS. The output signal
is limited by the slew rate of the output stage on the output load, that is
I_L/C_L, where I_L is the current from the output stage and C_L is the output
load capacitance. The output current can be estimated as I_L ≈ 2.5 V × g_9,
where g_9 is the transconductance of the output stage. For small signals, the
transconductance of P9 is 2 mA/V and that of N9 is 0.8 mA/V, although these
values are strongly nonlinear since the output stage swings from rail to rail.
In any case, the rise time is expected to be about two times smaller than the
fall time, since the rising edge is driven by P9 while the falling edge is
driven by N9. With these numbers, the time required for the full swing from
0 V to 2.5 V at the output is about 2.5 V/(I_L/C_L) ≈ C_L/g_9. With a load
capacitance of C_L = 8 pF, for instance, the output 0% to 100% rise time is
4 ns, which corresponds to a 10% to 90% rise time of 3.2 ns, and the output
100% to 0% fall time is 10 ns, which corresponds to a 90% to 10% fall time of
8 ns.
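The gain and slew-rate arithmetic above can be reproduced with the sketch below. It assumes, as the text does, that the output current is I_L ≈ 2.5 V × g (g being the small-signal transconductance of the output transistor driving the edge), and that a slew-limited edge is a linear ramp, so its 10% to 90% time is 80% of the full-swing time.

```python
import math

HYST_GAIN = 600.0     # input stage gain with hysteresis, low frequency [V/V]
INVERTER_GAIN = 20.0  # gain of each of the two inverters [V/V]
V_SWING = 2.5         # rail-to-rail output swing [V]

overall_gain = HYST_GAIN * INVERTER_GAIN**2
overall_gain_db = 20.0 * math.log10(overall_gain)


def slew_limited_edge(c_load, g_out):
    """Return (0-100% time, 10-90% time) of a slew-limited output edge."""
    i_load = V_SWING * g_out                   # I_L ~ 2.5 V x g
    full_swing = V_SWING / (i_load / c_load)   # equals c_load / g_out
    return full_swing, 0.8 * full_swing        # linear ramp: 10-90% is 80% of it


C_LOAD = 8e-12
rise_full, rise_10_90 = slew_limited_edge(C_LOAD, 2e-3)    # PMOS of the output stage
fall_full, fall_90_10 = slew_limited_edge(C_LOAD, 0.8e-3)  # NMOS of the output stage

print(f"overall gain {overall_gain:.3g} V/V = {overall_gain_db:.1f} dB")
print(f"rise {rise_full * 1e9:.1f} ns (10-90% {rise_10_90 * 1e9:.1f} ns), "
      f"fall {fall_full * 1e9:.1f} ns (90-10% {fall_90_10 * 1e9:.1f} ns)")
```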

The input transistors of the differential pair have a transconductance g_C of
about 700 µA/V, while the current-mirror transistors PM1 and PM2 have a
transconductance g_M of about 300 µA/V. These are the main contributors to the
noise of the comparator. Transistor NB7 does not contribute, because its noise
is common mode while the input stage is differential. So, in the case of the
comparator, a bias filtering capacitor for NB7 can be avoided. The
input-referred white voltage noise density can be expected to be



which, together with the 1/f contributions, corresponds to a voltage noise at
the input of about 65 µV RMS. Compared with the RMS noise at the output of the
CSA, which is more than 1 mV RMS even in the best case of a 3.3 pF input
capacitance, this contribution is negligible, at least with the attenuation
factor B = 1. With larger attenuations the weight of the comparator noise grows
accordingly, and at B = 10 it becomes significant. Since, as already mentioned,
the attenuation is only meant to be used with very large signals, for which the
signal-to-noise ratio is a minor concern, we will consider the case B = 1 in
the following. The jitter on the rising edge of the comparator is expressed by

The calculations leading to equation (7) are reported in appendix A.2. The time
constant τ_C = 5 ns is set by the bandwidth of the first stage of the
comparator. When the threshold is set at 300 ke⁻ (48 fC), equation (7) predicts
a jitter of 32 ps for 600 ke⁻ (96 fC) signals, of which 24 ps are due to the
series noise and 18 ps to the parallel noise. As for the ENC, the 1/f component
is negligible. According to equation (7), the jitter is expected to decrease to
8 ps for 1.5 Me⁻ (240 fC) signals. For larger signals, equation (7) predicts an
unlimited improvement; in reality the slope of the signal at the first stage of
the discriminator is also limited by the slew rate. So, in contrast with
equation (7), the jitter is expected at some point to stop decreasing for larger
signals, and to saturate to a constant value.



3 Performance of the prototype 

Figure 7 shows the signal at the output of the CSA in "low power" mode,
read out at the auxiliary output through the PMOS follower biased with a
1 kΩ resistor to the positive rail. The gain was set to the maximum value
(V_3 = V_4 = 0, V_F = 1), and pulses from 330 ke⁻ (53 fC) to 3.3 Me⁻ (530 fC)
were injected at the input by an Agilent 81130A 600 MHz step generator through
a 0.5 pF test capacitance. The 10% to 90% rise time of the test signals is
0.6 ns, simulating the typical charge collection time of a fast
photomultiplier. The output of the PMOS follower was buffered with a Texas
Instruments LMH6703 fast opamp driving a terminated 50 Ω line. The signals were
acquired with an Agilent DCA-X 86100D 20 GHz sampling scope, with the bandwidth
limited to 12 GHz in our measurements.
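The test charges follow from the step amplitude across the 0.5 pF injection capacitance, Q = C_test × ΔV. The sketch below, assuming an ideal injection capacitor with no parasitics, converts between electrons and fC and gives the step amplitudes corresponding to the 330 ke⁻ to 3.3 Me⁻ range.

```python
Q_E = 1.602176634e-19  # elementary charge [C]
C_TEST = 0.5e-12       # injection capacitance [F]


def electrons_to_fc(n_electrons):
    """Charge of n_electrons, in fC."""
    return n_electrons * Q_E * 1e15


def step_for_charge(n_electrons, c_test=C_TEST):
    """Voltage step [V] that injects n_electrons through an ideal c_test."""
    return n_electrons * Q_E / c_test


for n in (330e3, 3.3e6):
    print(f"{n:.3g} e-: {electrons_to_fc(n):.0f} fC, "
          f"step {step_for_charge(n) * 1e3:.0f} mV")
```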

The leading edge of the measured analog signal in response to a 330 ke⁻
(53 fC) pulse is 2.8 ns (10% to 90%), its trailing edge is 15.8 ns (90% to
10%), and the pulse width at 50% is 8 ns. The corresponding time constants are
τ_R = 1.3 ns and τ_F = 7.2 ns. Due to the finite bandwidth of the PMOS follower,
the measured signal is slower than the signal at the output of the CSA which
feeds the input of the discriminator. Since the transconductance of the PMOS
follower is less than 1 mA/V and its bias resistor is 1 kΩ, the amplitude of
the buffered signal is smaller than at the output of the CSA.
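The conversion from measured edge times to time constants uses the fact that a single exponential takes ln(9) ≈ 2.2 time constants to go from 10% to 90% (or from 90% to 10%). A minimal sketch, assuming each edge is dominated by a single exponential:

```python
import math

LN9 = math.log(9.0)  # 10%-90% span of a single exponential, in time constants


def tau_from_edge(t_10_90):
    """Time constant of a single-exponential edge from its 10%-90% time."""
    return t_10_90 / LN9


tau_r = tau_from_edge(2.8e-9)   # leading edge, 10% to 90%
tau_f = tau_from_edge(15.8e-9)  # trailing edge, 90% to 10%
print(f"tau_R = {tau_r * 1e9:.2f} ns, tau_F = {tau_f * 1e9:.2f} ns")
```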

The input noise was obtained by measuring the baseline noise at the auxiliary
output and referring it to the input of the CSA as an equivalent noise charge
(ENC). The measured ENC for an input capacitance of 3.3 pF is 6 ke⁻ (1 fC) RMS,
consistent with equation (3), which predicts 5.6 ke⁻ (0.89 fC), as already
mentioned, once the rise and fall times measured at the output of the analog
buffer are taken into account. Low noise matters mainly for the timing
performance, which will be discussed in the following.

Figure 7: Signals at the output of the analog buffer (auxiliary output) in "low
power" mode.

Figure 8 shows the signal at the output of the discriminator when the
CLARO-CMOS is operated in "low power" mode. The threshold was set at
6, which at the maximum gain corresponds to 800 ke⁻ (128 fC), and signals
from 810 ke⁻ (130 fC) to 5.6 Me⁻ (900 fC) were injected at the input. This
range of input signals corresponds to the typical single photon response of a
photomultiplier under nominal bias conditions. As already mentioned, the output
stage of the discriminator is designed to drive a capacitive load of a few pF.
In these tests the capacitive load at the output was measured to be 8 pF. This
load comes from the pads, the QFN48 package, and a short (a few cm) PCB trace
to a Texas Instruments LMH6703 fast opamp used as a low impedance driver for
the sampling scope. With this load, the 10% to 90% rise time is 2.2 ns, and the
90% to 10% fall time is 9.3 ns. The 50% pulse width depends on the amount of
charge injected at the input. It ranges from 7.2 ns for the shortest signal in
figure 8, just above threshold, to 21.7 ns for the largest signal in figure 8,
almost a factor of 10 above threshold. The delay between the input charge
pulse and the time when the output of the discriminator reaches 50% is 5 ns for
signals just above threshold, and decreases to about 2.5 ns for signals well above
threshold. The delay is due to the rise time of the CSA pulse at the input of the
comparator and to the dependence of the comparator speed on the level of
overdrive. The difference between the two extreme values, about 2.5 ns
in "low power" mode, constitutes the time walk of the discriminator. Time walk
is critical for timing performance, as discussed below.

Figure 8: Signals at the output of the discriminator (main output) in "low
power" mode.
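The sketch below illustrates how the CSA pulse shape alone produces time walk by computing when pulses of different amplitude cross a fixed threshold. It makes three assumptions. First, it uses the difference-of-exponentials shape implied by equation 9 in the appendix. Second, it borrows the time constants quoted for the buffered output, although the CSA output itself is faster. Third, it defines the threshold as the peak reached by an 800 ke⁻ pulse and treats the comparator as ideal, so the overdrive-dependent comparator delay is deliberately left out. For these reasons the printed times are not expected to match the measured 5 ns to 2.5 ns delays.

```python
import math

TAU_R = 1.3  # ns, rise time constant quoted for the buffered pulse
TAU_F = 7.2  # ns, fall time constant quoted for the buffered pulse

def pulse(t: float) -> float:
    """CSA response to a unit delta charge, in units of Q/C_F."""
    return TAU_F / (TAU_F - TAU_R) * (math.exp(-t / TAU_F) - math.exp(-t / TAU_R))

# Peak time of the difference of exponentials.
T_PEAK = TAU_F * TAU_R / (TAU_F - TAU_R) * math.log(TAU_F / TAU_R)

def crossing_time(q: float, q_th: float, iters: int = 60) -> float:
    """Time (ns) at which a pulse of charge q crosses the level reached at its
    peak by a pulse of charge q_th; bisection on the monotonic rising edge."""
    if q < q_th:
        raise ValueError("signal below threshold never crosses")
    target = (q_th / q) * pulse(T_PEAK)
    lo, hi = 0.0, T_PEAK
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pulse(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q_th = 800e3  # electrons
for q in (810e3, 1.6e6, 5.6e6):
    print(f"Q = {q / 1e3:6.0f} ke-: crossing at {crossing_time(q, q_th):.2f} ns")
```

Larger signals cross earlier, which is the pulse-shape part of the walk. The comparator overdrive effect described in the text adds to it.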

This performance was obtained in "low power" mode, with an overall continuous
power dissipation per channel of 0.7 mW. If the discriminator is triggered
at a 10 MHz rate, the average power consumption increases to 1.9 mW per
channel. Note that the signals in figures 7 and 8 were acquired with the
sampling scope. The displayed traces are the superposition of dots from several
output signals, with the sampling trigger synchronized to the step generator,
so the figures also show noise and jitter at a glance. The output signals
demonstrate the capability of the CLARO-CMOS to count fast pulses from
photomultipliers, from the single photoelectron up to larger gains, with low
noise, a very high rate (up to 10 MHz), and very low power consumption.

When the prototype is operated in "timing" mode, the power consumption
increases to 1.5 mW per channel (2.3 mW per channel at a 10 MHz
rate). The differences in the output signals between "low power" and "timing"
modes are small. The different power setting affects only the output of
the CSA, and its effect cannot be seen directly in the shape of the
buffered signals because of the limited bandwidth of the auxiliary output
buffer. The differences between the two operating modes are apparent in
the crosstalk and jitter measurements presented below.
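The quoted power figures also give the extra energy spent per discriminator firing. The sketch below assumes that power grows linearly with rate between the quiescent value and the 10 MHz value. This linearity is an assumption, not a measured property.

```python
def energy_per_event_pj(p_quiescent_mw: float, p_at_rate_mw: float, rate_hz: float) -> float:
    """Extra energy per discriminator firing in pJ, assuming the dynamic power
    grows linearly with the firing rate."""
    extra_power_w = (p_at_rate_mw - p_quiescent_mw) * 1e-3
    return extra_power_w / rate_hz * 1e12

def power_at_rate_mw(p_quiescent_mw: float, energy_pj: float, rate_hz: float) -> float:
    """Extrapolated average power (mW) at another rate, same linear assumption."""
    return p_quiescent_mw + energy_pj * 1e-12 * rate_hz * 1e3

# Per-channel power quoted in the text: (quiescent, at 10 MHz), in mW.
modes = {"low power": (0.7, 1.9), "timing": (1.5, 2.3)}
for name, (p0, p10) in modes.items():
    print(f"{name}: {energy_per_event_pj(p0, p10, 10e6):.0f} pJ per event")
```

With the numbers above, this gives about 120 pJ per event in "low power" mode and about 80 pJ in "timing" mode. If the linear assumption holds, "low power" mode would dissipate about 0.82 mW per channel at 1 MHz.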
Figure 9: Setup for crosstalk measurement. The capacitance $C_{xt}$ represents
the stray capacitance between the pixels of the sensor.



3.1 Crosstalk 

With very fast circuits such as the CLARO-CMOS, crosstalk may be critical,
because fast signals couple to neighbouring channels through parasitic
capacitances much more easily than slow ones. The level
of crosstalk between channels was measured as follows. The gain of the victim
channel was set to the maximum value and its threshold was set at 300 ke⁻
(48 fC). No signal was applied at the input of the victim, while large signals
were injected at the input of a neighbouring channel. The crosstalk was then
estimated from the amplitude of the minimum signal that triggers the
discriminator of the victim. To simulate the real case where different pixels of a
pixellated photodetector are connected to the inputs of the CLARO, a crosstalk
capacitance $C_{xt}$ was added between the inputs as depicted in figure 9. The
input capacitance to ground in this measurement was $C_I$ = 6.5 pF.

The level of crosstalk was measured with different values of $C_{xt}$ both in
"low power" and "timing" modes, and the results are plotted and linearly fitted
in figure 10. The crosstalk found on chip, that is with $C_{xt}$ = 0, is negligible:
signals up to 10 Me⁻ (1.6 pC) were injected without triggering the victim.
Increasing the value of $C_{xt}$ increases the crosstalk correspondingly.
The intercepts of the fitted lines are compatible with zero, confirming that
no crosstalk is observed if no capacitance is added between the inputs
outside the ASIC. The value of $C_{xt}$ in a given application depends on the
type of sensor. For instance, the capacitance between the anodes of a
Hamamatsu R7600 Ma-PMT is less than 0.5 pF. This would translate into a
crosstalk level below 2% in "low power" mode, and below 1% in "timing" mode.
Crosstalk is lower in "timing" mode thanks to the lower input impedance, due
to the larger loop gain of the CSA, as already discussed. For fast readout of
pixellated sensors it is mandatory to keep the parasitic capacitance between
neighbouring inputs under control. Where $C_{xt}$ cannot be reduced because of
the characteristics of the sensor, a larger $C_I$ should be used. This would
affect noise and bandwidth, but would help in eliminating crosstalk.

Figure 10: Crosstalk versus crosstalk capacitance $C_{xt}$ in "low power" and
"timing" modes.

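The crosstalk estimate can be made explicit if one assumes that crosstalk is the ratio between the victim threshold and the smallest aggressor charge that fires the victim. The text does not spell out the exact definition, so this is an assumption. With this convention, the on-chip test yields an upper bound rather than a measured value.

```python
def crosstalk(q_threshold: float, q_aggressor: float) -> float:
    """Crosstalk fraction, assuming it is defined as the victim threshold divided
    by the smallest aggressor charge that triggers the victim discriminator."""
    return q_threshold / q_aggressor

# On-chip test (C_xt = 0): 300 ke- threshold and no trigger up to 10 Me- injected,
# so the crosstalk is below threshold / largest charge tested.
bound = crosstalk(300e3, 10e6)
print(f"on-chip crosstalk below {100 * bound:.0f}%")

# Conversely, with the same threshold a 2% crosstalk level would mean the victim
# fires for aggressor charges of about:
print(f"{300e3 / 0.02 / 1e6:.0f} Me-")
```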
3.2 Timing resolution 

To evaluate the timing performance of the CLARO-CMOS prototype, the gain of
the CSA was set to the maximum value, and the threshold of the discriminator
was set at 300 ke⁻ (48 fC). Since the timing resolution is expected to improve
in proportion to the signal-to-noise ratio, the use of small input signals
corresponds to a conservative, worst case scenario. The time resolution of this
setup was estimated to be 7 ps RMS by directly connecting the Agilent 81130A
step generator to the Agilent DCA-X 86100D sampling scope. Some of the
measurements presented below reach 10 ps, and in these cases the result
is partially limited by the setup. The setup contribution of 7 ps was subtracted
in quadrature from the measurements. Moreover, as already mentioned, the
10% to 90% rise time of the input test signals is 0.6 ns. This is not negligible
compared to the rise time predicted at the output of the CSA by equation 2
in "timing" mode with a low input capacitance. However, as expressed by
equation 7, the timing resolution on the rising edge of the discriminator signal
is limited by the time constant of the first stage of the comparator, $\tau_C$, which is
about 5 ns. The 0.6 ns contribution of the test signal generator is therefore
expected to be negligible in the jitter measurements. It may still have some
impact on the effectiveness of the time over threshold compensation presented below.
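Quadrature subtraction of the setup contribution assumes that setup and device jitter are uncorrelated, so their variances add. The function below is a minimal sketch of the operation, with an illustrative name.

```python
import math

def subtract_in_quadrature(measured_ps: float, setup_ps: float) -> float:
    """Remove an uncorrelated setup contribution from an RMS jitter value."""
    if measured_ps < setup_ps:
        raise ValueError("measurement smaller than the setup contribution")
    return math.sqrt(measured_ps**2 - setup_ps**2)

for m in (10.0, 20.0, 100.0):
    print(f"measured {m:5.1f} ps -> device {subtract_in_quadrature(m, 7.0):5.1f} ps")
```

At 10 ps the correction is large (about 7.1 ps remain), while at 100 ps it is negligible (about 99.75 ps remain). For this reason only the best results are noticeably affected by the 7 ps setup contribution.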

The overall timing performance of a system composed of a sensor and a low
jitter readout circuit also depends on the precision of time walk compensation.
Without it, the low jitter would be spoiled by the time walk induced by the
amplitude spread of the signals coming from the sensor. Figure 11 shows the
dependence of the delay on the pulse width, starting from signals just above
threshold. The difference in the delay for a given range of input charge is the
time walk of the discriminator. This curve is the basis of time walk
compensation by time over threshold measurement, and the slope of the fitted
lines can be used to estimate how effectively the time over threshold
compensates time walk. To a first order approximation, the
curves of figure 11 do not depend on threshold. The measurements were taken
both in "low power" mode and "timing" mode. In "low power" mode, as already
mentioned, the delay ranges from about 5 ns to 2.5 ns, so the time walk for
this range of input signals, that is the difference between the two, is 2.5 ns.
In "timing" mode, as shown in figure 11, the time walk of the discriminator
is reduced by about a factor of 2. Thus, even if the shape of the output signals
and the maximum sustainable rate are the same as in "low power" mode, the
effectiveness of a time over threshold measurement in compensating time walk
is improved by a factor of 2.

Figure 11: Delay versus pulse width in "low power" and "timing" modes.
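Time over threshold compensation can be sketched as a linear calibration of delay versus pulse width, subtracted hit by hit. The calibration points below are synthetic and hypothetical. They are generated on a line with slope magnitude 0.113, the value quoted below for "low power" mode. The measured curves in figure 11 are not necessarily exactly linear.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def correct_time(t_leading, width, a, b):
    """Subtract the delay predicted from the measured width (time over threshold)."""
    return t_leading - (a + b * width)

# Synthetic calibration (hypothetical): delay falls linearly with pulse width.
widths = [8.0, 10.0, 13.0, 17.0, 21.0]              # ns
delays = [5.0 - 0.113 * (w - 8.0) for w in widths]  # ns
a, b = fit_line(widths, delays)
print(f"fitted slope = {b:.3f}")

# Hits sharing the same true arrival time t0 but with different amplitudes.
t0 = 100.0
hits = [(t0 + d, w) for w, d in zip(widths, delays)]
print([round(correct_time(t, w, a, b), 6) for t, w in hits])
```

Since the width is the falling-edge time minus the rising-edge time, the corrected time equals (1 + b) times the rising-edge time minus b times the falling-edge time, minus the intercept a. The falling-edge jitter therefore enters with weight |b|, which corresponds to the slope γ used below.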

The measured RMS jitter versus input charge is displayed in figure 12 for
the "low power" mode. The jitter on the rising edge is 113 ps at threshold
(about 300 ke⁻, or 48 fC), decreases to 34 ps for signals of 560 ke⁻ (90 fC),
and reaches 9 ps for large signals (4.5 Me⁻, or 720 fC). The measured values
match well the values predicted by equation 7. For larger pulses, the rising
edge jitter stops decreasing and saturates at a constant value.

The jitter on the falling edge is larger because the transition is slower.
Moreover, it is affected by a small disturbance on ground that occurs when the
discriminator triggers, which explains the non-monotonic behaviour of the falling
edge jitter shown in figure 12. However, the falling edge is only used to
compensate time walk. The weight of the falling edge jitter in a timing
measurement is thus given by the relation between time walk and pulse width,
that is the slope $\gamma$ of the lines used to fit the data in figure 11. In other
words, the jitter on the falling edge is normalized according to

$$\sigma_{\mathrm{Fall,norm}} = \gamma\,\sigma_{\mathrm{Fall}} \tag{8}$$

where $\gamma$ is 0.113 in "low power" mode and 0.055 in "timing" mode, as shown
in the legend of figure 11. The jitter on the falling edge normalized with this
weight is shown in the plot. It is about 100 ps just above threshold, decreasing
to 13 ps with large signals. The overall timing performance (including time
walk compensation) is given by the quadratic sum of the rising edge jitter and
the normalized falling edge jitter, and is shown by the red curve of figure 12.
It goes from 135 ps just above threshold to 50 ps at 780 ke⁻ (125 fC), and
decreases further to 17 ps with 4.5 Me⁻ (720 fC) signals.

Figure 12: Jitter versus input charge in "low power" mode.

The same measurements are given in figure 13 for the "timing" mode. The
RMS jitter on the rising edge goes from 92 ps just above threshold (300 ke⁻, or
48 fC) to 10 ps with large signals (4.5 Me⁻, or 720 fC). The rise time $\tau_R$ of
the CSA pulse is now smaller than in "low power" mode, so the jitter on the rising
edge is slightly smaller. Since the speed is in any case limited by the first stage
of the discriminator, the values are still in agreement with those predicted by
equation 7. Since the time walk compensation is now twice as effective as before,
the normalized jitter on the falling edge goes from 44 ps to 6 ps, becoming almost
negligible. The overall timing resolution is thus 102 ps just above threshold,
quickly decreasing below 50 ps above 380 ke⁻ (61 fC), and ultimately reaching
14 ps for 4.5 Me⁻ (720 fC) signals.
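The combination of equation 8 with the rising-edge jitter can be checked numerically. The sketch below, with an illustrative function name, recovers the raw falling-edge jitter implied by the 44 ps normalized value and γ = 0.055, then forms the quadratic sum.

```python
import math

def total_jitter(rise_ps: float, fall_ps: float, gamma: float) -> float:
    """Resolution after time walk compensation: quadratic sum of the rising-edge
    jitter and the falling-edge jitter weighted by gamma (equation 8)."""
    return math.hypot(rise_ps, gamma * fall_ps)

# "timing" mode just above threshold: 92 ps on the rising edge, 44 ps normalized
# falling-edge jitter with gamma = 0.055.
sigma_fall_raw = 44.0 / 0.055
print(f"raw falling-edge jitter ~ {sigma_fall_raw:.0f} ps")
print(f"overall ~ {total_jitter(92.0, sigma_fall_raw, 0.055):.0f} ps")
```

This reproduces the 102 ps overall resolution quoted for "timing" mode just above threshold.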
Figure 13: Jitter versus input charge in "timing" mode.



4 Conclusions 

The first prototype of the CLARO-CMOS was extensively characterized, with
particular emphasis on its timing resolution and on the effectiveness of time
walk compensation through time over threshold measurement. The prototype
performs as expected, confirming the adequacy of the design approach
described. The measured time resolution, down to 10 ps RMS for input charge
pulses corresponding to single photoelectron signals from a typical
photomultiplier, is outstanding considering the very low power dissipation of
the prototype, below 1 mW per channel.
A Calculations



A.1 Noise

In the complex frequency domain the transfer function of the CSA of figure 3
can be written as

$$TF_{CSA}(s) = -\frac{1}{sC_F}\,\frac{s\tau_F}{(1+s\tau_F)(1+s\tau_R)} \tag{9}$$
where $s = i\omega$ is the complex frequency, $\tau_F = C_F R_F$, and $\tau_R$ is the rise
time constant given by equation 2. The response to a delta-like pulse $Q\delta(t)$ is
obtained by multiplying equation 9 by $Q$ and taking the inverse Laplace transform,
which gives equation 1. A white current noise density $i_n$ at the input is converted
to a voltage noise at the output, which is given by

$$V_{oi}(s) = i_n\,\frac{1}{sC_F}\,\frac{s\tau_F}{(1+s\tau_F)(1+s\tau_R)} \tag{10}$$

A voltage white noise density $e_n$ at the input can be converted to its Norton
equivalent, that is a current noise density $sC_I e_n$. The corresponding voltage
noise density at the output is

$$V_{oe}(s) = e_n\,\frac{C_I}{C_F}\,\frac{s\tau_F}{(1+s\tau_F)(1+s\tau_R)} \tag{11}$$

and the same happens for a low frequency voltage noise with squared density $A_f/f$,
which gives

$$V_{oA}(s) = \sqrt{\frac{A_f}{f}}\,\frac{C_I}{C_F}\,\frac{s\tau_F}{(1+s\tau_F)(1+s\tau_R)} \tag{12}$$

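To obtain the squared RMS noise at the output, one must integrate the squared
amplitudes of equations 10, 11 and 12 in $d\omega/2\pi$ over the whole frequency
spectrum. Equation 10 gives

$$V^2_{oi,RMS} = \frac{i_n^2}{C_F^2}\int_0^\infty \frac{\tau_F^2}{(1+\omega^2\tau_F^2)(1+\omega^2\tau_R^2)}\,\frac{d\omega}{2\pi} = \frac{i_n^2}{C_F^2}\,\frac{\tau_F^2}{4(\tau_F+\tau_R)} \tag{13}$$

equation 11 gives

$$V^2_{oe,RMS} = e_n^2\frac{C_I^2}{C_F^2}\int_0^\infty \frac{\omega^2\tau_F^2}{(1+\omega^2\tau_F^2)(1+\omega^2\tau_R^2)}\,\frac{d\omega}{2\pi} = e_n^2\frac{C_I^2}{C_F^2}\,\frac{\tau_F}{4(\tau_F\tau_R+\tau_R^2)} \tag{14}$$

and equation 12 gives

$$V^2_{oA,RMS} = A_f\frac{C_I^2}{C_F^2}\int_0^\infty \frac{2\pi}{\omega}\,\frac{\omega^2\tau_F^2}{(1+\omega^2\tau_F^2)(1+\omega^2\tau_R^2)}\,\frac{d\omega}{2\pi} = A_f\frac{C_I^2}{C_F^2}\,\frac{\tau_F^2}{\tau_F^2-\tau_R^2}\ln\frac{\tau_F}{\tau_R} \tag{15}$$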
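The closed forms of equations 13 to 15 can be cross-checked numerically with the sketch below. It makes several simplifying choices. Time is normalized to $\tau_F$, so that $\tau_R = x$. The noise and capacitance prefactors are dropped. The substitution ω = tan θ maps the infinite range onto [0, π/2], and a composite Simpson rule does the integration. The sketch also compares the first-order forms of equations 16 to 18, which for all three terms differ from the exact expressions by the factor 1 − x².

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def noise_integrals(x):
    """Integrals of equations 13-15 with tau_F = 1 and tau_R = x (prefactors
    dropped), after substituting omega = tan(theta)."""
    def den(t):
        return math.cos(t) ** 2 + (x * math.sin(t)) ** 2
    half_pi = math.pi / 2
    i13 = simpson(lambda t: math.cos(t) ** 2 / den(t), 0.0, half_pi) / (2 * math.pi)
    i14 = simpson(lambda t: math.sin(t) ** 2 / den(t), 0.0, half_pi) / (2 * math.pi)
    i15 = simpson(lambda t: math.sin(t) * math.cos(t) / den(t), 0.0, half_pi)
    return i13, i14, i15

def closed_forms(x):
    """Right-hand sides of equations 13, 14 and 15 for tau_F = 1, tau_R = x."""
    return 1 / (4 * (1 + x)), 1 / (4 * (x + x * x)), math.log(1 / x) / (1 - x * x)

def first_order(x):
    """Equations 16, 17 and 18 with the same normalization."""
    return (1 - x) / 4, (1 - x) / (4 * x), math.log(1 / x)

for x in (0.1, 1.3 / 7.2, 0.3):
    num, exact, approx = noise_integrals(x), closed_forms(x), first_order(x)
    print(f"x = {x:.3f}")
    for name, n_, e_, a_ in zip(("eq13", "eq14", "eq15"), num, exact, approx):
        print(f"  {name}: numeric {n_:.6f}  closed {e_:.6f}  first order {a_:.6f}")
```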

If one lets $\tau_R \to \tau_F$, the above expressions reduce to the known expressions
for an RC-CR filter [18]. In our case $\tau_R < 0.3\tau_F$, so we can approximate by
expanding to the first order in $\tau_R/\tau_F$. Equation 13 becomes

$$V^2_{oi,RMS} \simeq \frac{i_n^2}{C_F^2}\,\frac{\tau_F-\tau_R}{4} \tag{16}$$

equation 14 becomes

$$V^2_{oe,RMS} \simeq e_n^2\frac{C_I^2}{C_F^2}\,\frac{\tau_F-\tau_R}{4\tau_F\tau_R} \tag{17}$$
and equation 15 becomes

$$V^2_{oA,RMS} \simeq A_f\frac{C_I^2}{C_F^2}\ln\frac{\tau_F}{\tau_R} \tag{18}$$

Summing equations 16, 17 and 18 one obtains the total squared RMS
noise at the output:

$$V^2_{o,RMS} \simeq \frac{i_n^2}{C_F^2}\,\frac{\tau_F-\tau_R}{4} + e_n^2\frac{C_I^2}{C_F^2}\,\frac{\tau_F-\tau_R}{4\tau_F\tau_R} + A_f\frac{C_I^2}{C_F^2}\ln\frac{\tau_F}{\tau_R} \tag{19}$$



The square root of equation 19 gives the total RMS noise at the output of the
CSA. To obtain the noise referred to the input as ENC one must calculate

$$ENC = \frac{V_{o,RMS}}{V_{o,MAX}(Q)/Q} \tag{20}$$

where $V_{o,MAX}(Q)$ is the peak output voltage for a charge $Q$, which can be obtained
from equation 1 and is

$$V_{o,MAX}(Q) = \frac{Q}{C_F}\left(\frac{\tau_R}{\tau_F}\right)^{\frac{\tau_R}{\tau_F-\tau_R}} \tag{21}$$

which, expanding for small $\tau_R/\tau_F$, becomes

$$V_{o,MAX}(Q) \simeq \frac{Q}{C_F}\left(1 + \frac{\tau_R}{\tau_F}\ln\frac{\tau_R}{\tau_F}\right) \tag{22}$$

where the expression was approximated using the fact that $x^x \simeq 1 + x\ln x$ for
small $x$, and all the terms in $\tau_R/\tau_F$ with power equal to or higher than 2 were
dropped. Equation 22 can be further approximated by

$$V_{o,MAX}(Q) \simeq \frac{Q}{C_F}\left(1 + 2\,\frac{\tau_R}{\tau_F}\right)^{-1} \tag{23}$$

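For $\tau_R \ll \tau_F$, both equations give $V_{o,MAX}(Q) = Q/C_F$. For $\tau_R < 0.3\tau_F$,
equation 23 approximates equation 22 within a 10% error. For instance, for
$\tau_R = \tau_F/e$ equation 22 gives $0.63\,Q/C_F$, while equation 23 gives $0.58\,Q/C_F$.
The approximation of equation 23 is based on the fact that the coefficient 2 is
the closest integer to $e/(e-1)$, which is obtained by imposing the values of
equations 22 and 23 to be equal for $\tau_R/\tau_F = 1/e$. From equations 19, 20 and 23
one obtains the expression for the squared ENC, that is

$$ENC^2 \simeq \left(1 + 2\,\frac{\tau_R}{\tau_F}\right)^2\left[i_n^2\,\frac{\tau_F-\tau_R}{4} + e_n^2 C_I^2\,\frac{\tau_F-\tau_R}{4\tau_F\tau_R} + A_f C_I^2\ln\frac{\tau_F}{\tau_R}\right] \tag{24}$$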
which to the first order in $\tau_R/\tau_F$ can also be written as

By taking the square root of equation 25 we obtain equation 3.
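The peak-amplitude approximations and the resulting ENC can be explored numerically. The sketch below evaluates equations 21, 22 and 23 as functions of x = $\tau_R/\tau_F$, then takes the square root of equation 24, in which $C_F$ has already dropped out. The noise densities in the example call are hypothetical placeholders, not the values of the prototype, so the printed ENC is not meant to reproduce the measured 6 ke⁻.

```python
import math

def vmax_exact(x):
    """Equation 21: peak CSA output in units of Q/C_F, with x = tau_R/tau_F < 1."""
    return x ** (x / (1.0 - x))

def vmax_eq22(x):
    """Equation 22: first-order expansion using x**x ~ 1 + x*ln(x)."""
    return 1.0 + x * math.log(x)

def vmax_eq23(x):
    """Equation 23: rational approximation with coefficient 2."""
    return 1.0 / (1.0 + 2.0 * x)

for x in (0.05, 0.18, 0.3, 1 / math.e):
    print(f"x = {x:.3f}: exact {vmax_exact(x):.3f}, "
          f"eq22 {vmax_eq22(x):.3f}, eq23 {vmax_eq23(x):.3f}")

def enc_electrons(i_n, e_n, a_f, c_i, tau_f, tau_r):
    """Square root of equation 24, in electrons.

    SI inputs: i_n [A/sqrt(Hz)], e_n [V/sqrt(Hz)], a_f [V^2] (noise power
    density a_f/f), c_i [F], tau_f and tau_r [s]."""
    x = tau_r / tau_f
    bracket = (i_n**2 * (tau_f - tau_r) / 4
               + e_n**2 * c_i**2 * (tau_f - tau_r) / (4 * tau_f * tau_r)
               + a_f * c_i**2 * math.log(tau_f / tau_r))
    enc_coulomb = (1.0 + 2.0 * x) * math.sqrt(bracket)
    return enc_coulomb / 1.602176634e-19

# Hypothetical noise densities; time constants as quoted for the buffer output.
print(f"ENC ~ {enc_electrons(1e-12, 2e-9, 1e-11, 3.3e-12, 7.2e-9, 1.3e-9):.0f} e-")
```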
A.2 Jitter



To calculate the impact of the noise of the CSA on the timing resolution of the
discriminator, one must consider the overall transfer function of the CSA and of
the first stage of the comparator when it is triggering, that is when the voltages
at its two inputs are almost equal. Neglecting hysteresis, the transfer function
of the first stage of the comparator in the complex frequency domain can be
modelled as $G/(1+s\tau_C)$. By combining this with equation 9 we obtain the
transfer function of the whole chain from the CSA input to the discriminator
output:

$$TF_{TOT}(s) \simeq -\frac{1}{sC_F}\,\frac{G}{1+s\tau_C}\,\frac{s\tau_F}{1+s\tau_F} \tag{26}$$

where equation 9 was approximated for $\tau_R \simeq 0$, since the bandwidth is now limited
by $\tau_C$, which is expected to be larger than $\tau_R$, at least for small values of the
input capacitance $C_I$. As was done for the squared RMS noise at the output of the
CSA alone, given by equations 13, 14 and 15, we can calculate the squared RMS
noise at the output of the first stage of the discriminator. For a current noise
source $i_n$ we obtain

$$V^2_{oi,RMS} = \frac{i_n^2}{C_F^2}\,G^2\,\frac{\tau_F^2}{4(\tau_F+\tau_C)} \simeq \frac{i_n^2}{C_F^2}\,G^2\,\frac{\tau_C}{8} \tag{27}$$

for the voltage white noise

$$V^2_{oe,RMS} = e_n^2\frac{C_I^2}{C_F^2}\,G^2\,\frac{\tau_F}{4(\tau_F\tau_C+\tau_C^2)} \simeq e_n^2\frac{C_I^2}{C_F^2}\,G^2\,\frac{1}{8\tau_C} \tag{28}$$

and for the voltage low frequency noise

$$V^2_{oA,RMS} = A_f\frac{C_I^2}{C_F^2}\,G^2\,\frac{\tau_F^2}{\tau_F^2-\tau_C^2}\ln\frac{\tau_F}{\tau_C} \simeq A_f\frac{C_I^2}{C_F^2}\,G^2\,\frac{1}{2} \tag{29}$$

where the expressions were approximated for $\tau_C \simeq \tau_F$. The sum of these gives
the total RMS noise at the output of the first stage of the discriminator. To
obtain the corresponding timing resolution, one must divide the voltage noise
by the slope of the signal at the output:

$$\sigma_{RE}(Q) = \frac{V_{o,RMS}}{\dot V_o(t_{TH})} \tag{30}$$

where $t_{TH}$ is the time when the second stage of the discriminator triggers.
By multiplying equation 26 by the input charge in excess of threshold,
that is $Q - Q_{th}$, then computing its inverse Laplace transform and differentiating
it with respect to time, one obtains

$$\dot V_o(t) = \frac{Q-Q_{th}}{C_F}\,G\,\frac{\tau_F}{\tau_F-\tau_C}\left(\frac{e^{-t/\tau_C}}{\tau_C} - \frac{e^{-t/\tau_F}}{\tau_F}\right) \tag{31}$$

which, considering that $\tau_F \simeq \tau_C$, becomes

$$\dot V_o(t) = \frac{Q-Q_{th}}{C_F}\,G\,\frac{\tau_C - t}{\tau_C^2}\,e^{-t/\tau_C} \tag{32}$$

Assuming that the second stage of the discriminator triggers for $t \ll \tau_C$, equation
32 gives

$$\dot V_o(t_{TH}) \simeq \frac{(Q-Q_{th})\,G}{C_F\,\tau_C} \tag{33}$$

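By plugging equation 33 together with equations 27, 28 and 29 into equation
30 one obtains

$$\sigma^2_{RE}(Q) = \frac{1}{(Q-Q_{th})^2}\left(\frac{i_n^2\tau_C^3}{8} + \frac{e_n^2C_I^2\tau_C}{8} + \frac{A_fC_I^2\tau_C^2}{2}\right) \tag{34}$$

By taking the square root of equation 34 one obtains equation 7.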
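Equation 34 can be evaluated directly, as in the sketch below, which uses SI units throughout. The noise parameters are hypothetical because $i_n$, $e_n$ and $A_f$ are not given in this section. The printed values therefore only illustrate the $1/(Q-Q_{th})$ scaling and are not predictions for the CLARO-CMOS. The sketch also checks that the slope of equation 31 at t = 0 equals $1/\tau_C$ exactly, which is the value used in equation 33.

```python
import math

def rising_edge_jitter(q, q_th, i_n, e_n, a_f, c_i, tau_c):
    """RMS rising-edge jitter in seconds from equation 34 (SI units).

    q, q_th : input and threshold charge [C]
    i_n     : white current noise density [A/sqrt(Hz)]
    e_n     : white voltage noise density [V/sqrt(Hz)]
    a_f     : 1/f coefficient, noise power density a_f/f [V^2]
    c_i     : input capacitance [F]
    tau_c   : time constant of the comparator first stage [s]
    """
    if q <= q_th:
        raise ValueError("equation 34 applies only above threshold")
    bracket = (i_n**2 * tau_c**3 / 8
               + e_n**2 * c_i**2 * tau_c / 8
               + a_f * c_i**2 * tau_c**2 / 2)
    return math.sqrt(bracket) / (q - q_th)

def slope_eq31(t, tau_f, tau_c):
    """Time dependence of equation 31, in units of (Q - Q_th) G / C_F."""
    return tau_f / (tau_f - tau_c) * (math.exp(-t / tau_c) / tau_c
                                      - math.exp(-t / tau_f) / tau_f)

# Hypothetical noise parameters, chosen only to exercise the formula.
params = dict(i_n=1e-12, e_n=2e-9, a_f=1e-11, c_i=3.3e-12, tau_c=5e-9)
q_th = 48e-15
for q in (60e-15, 96e-15, 720e-15):
    sigma_ps = rising_edge_jitter(q, q_th, **params) * 1e12
    print(f"Q = {q * 1e15:5.0f} fC: sigma = {sigma_ps:.1f} ps")

# At t = 0, equation 31 reduces to exactly 1/tau_c, the slope of equation 33.
print(slope_eq31(0.0, 7.2e-9, 5e-9) * 5e-9)
```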